Enable dispatch to tinygemm int4 and int8 kernels for quantized tensor #230
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/230
Note: links to docs will display an error until the docs builds have been completed. ❌ 1 New Failure as of commit 2a8dc5d with merge base e7bbbd2. This comment was automatically generated by Dr. CI and updates every 15 minutes.
waiting for #227 to be landed before this can actually be tested
torchao/quantization/subclass.py
Outdated
@@ -832,7 +836,54 @@ def __torch_dispatch__(cls, func, types, args, kwargs):
        args[1],
        None if len(args) == 2 else args[2],
    )
-   if weight_qtensor.input_quant_func is not None:
+   if weight_qtensor.input_quant_func is None:
Let's please remove input_quant_func and discuss using AffineQuantizedTensor as an organizing principle :)
this is used for testing 8da4w right now; can removing it be deferred until there is a better alternative?
Setting request changes just so we don't forget to remove input_quant_func
will do in next PR
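For readers following the thread: a minimal plain-Python sketch (hypothetical names, not the actual torchao implementation) of how a field like `input_quant_func` can gate dynamic vs. weight-only quantization at dispatch time:

```python
# Hypothetical sketch, NOT the real torchao subclass. A quantized weight
# either quantizes activations on the fly (dynamic quant) or leaves them
# in floating point (weight-only quant), selected by input_quant_func.
class QuantizedWeight:
    def __init__(self, data, input_quant_func=None):
        self.data = data                      # pre-quantized weight payload
        self.input_quant_func = input_quant_func

    def linear(self, activation):
        if self.input_quant_func is None:
            # Weight-only path: activation stays floating point.
            return [a * w for a, w in zip(activation, self.data)]
        # Dynamic path: quantize each activation value first.
        return [self.input_quant_func(a) * w
                for a, w in zip(activation, self.data)]

w = QuantizedWeight([2, 3], input_quant_func=round)
print(w.linear([1.4, 0.6]))  # → [2, 3]: activations rounded before the multiply
```

In the real subclass the branch lives inside `__torch_dispatch__` for the linear op; the sketch only illustrates why a `None` check selects the weight-only path.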
torchao/quantization/subclass.py
Outdated
# TODO: add padding support
class TinygemmAffineQuantizedTensor(AffineQuantizedTensor):
Why this?
CI is red and let's please not add a TinygemmAffineQuantizedTensor
sorry, CI should be fixed now. For tinygemm, it quantizes things differently, so we need to think about how to unify this
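One concrete way to see the mismatch (a sketch under assumed parametrizations, not the exact kernel contracts): the standard affine scheme stores an integer `zero_point` with `dequant = (q - zero_point) * scale`, while the tinygemm-style scheme is commonly described as storing a floating-point `zeros` value with `dequant = (q - 2**(b-1)) * scale + zeros`. Both round-trip the same data, but the parameters are not interchangeable:

```python
# Sketch of the parametrization mismatch (assumed formulas, illustrative data).
b = 4                                   # int4
x = [0.1, 0.5, 0.9, 1.3]
min_val, max_val = min(x), max(x)
scale = (max_val - min_val) / (2**b - 1)

# Standard affine scheme: integer zero point.
zp = round(-min_val / scale)
q_affine = [max(0, min(2**b - 1, round(v / scale) + zp)) for v in x]
deq_affine = [(q - zp) * scale for q in q_affine]

# tinygemm-style scheme: floating-point "zeros", mid-point shift.
zeros = min_val + scale * 2**(b - 1)
q_tg = [max(0, min(2**b - 1, round((v - min_val) / scale))) for v in x]
deq_tg = [(q - 2**(b - 1)) * scale + zeros for q in q_tg]

# Both dequantize back to within one step of x, but zp is an int while
# zeros is a float, so the two parameter sets cannot be swapped directly.
```

Unifying the two would mean something like converting between the parametrizations at dispatch time, or letting the quantized tensor subclass carry both kinds of zero point.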
Force-pushed from b5950a3 to 449a5d8 (compare)
Force-pushed from 61befaa to 943bf13 (compare)
…ed tensor

Summary:
This adds some dispatch to the tinygemm kernels for CUDA, although we need to resolve the implementation mismatch problem for tinygemm first

Test Plan:
python test/quantization/test_quant_api.py -k test_quantized_tensor_subclass_int4
python test/quantization/test_quant_api.py -k test_quantized_tensor_subclass_int8

Reviewers:
Subscribers:
Tasks:
Tags:
@@ -50,14 +49,14 @@ def _apply_dynamic_quant(model):
    """
    _replace_with_custom_fn_if_matches_filter(
        model,
-       lambda linear_mod: dynamic_quant(linear_mod, (torch.randn(1, linear_mod.in_features))),
+       lambda linear_mod: dynamic_quant(linear_mod, (torch.randn(1, linear_mod.in_features),)),
Interesting. Why is that extra comma needed now?
By all means don't be blocked on this comment haha
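For reference on the comma: in Python, parentheses alone only group an expression; it is the trailing comma that creates a one-element tuple, which is presumably what `dynamic_quant` expects for its example-inputs argument:

```python
x = [1.0, 2.0]       # stand-in for torch.randn(1, linear_mod.in_features)

args_wrong = (x)     # parentheses only group: this is just x itself
args_right = (x,)    # the trailing comma makes a 1-tuple containing x

print(type(args_wrong))  # → <class 'list'>
print(type(args_right))  # → <class 'tuple'>
```

So `(torch.randn(1, n))` passes a bare tensor, while `(torch.randn(1, n),)` passes a 1-tuple containing it.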
self.assertTrue(torch.equal(scale, scale_ref))
torch.testing.assert_close(zero_point_float, zero_point_ref, rtol=0.00001, atol=torch.max(scale)*0.03)
self.assertTrue(torch.equal(zero_point, zero_point_ref))
Neat
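For context, `torch.testing.assert_close` documents its comparison as `|actual - expected| <= atol + rtol * |expected|`, so the `atol=torch.max(scale)*0.03` above lets the float zero point drift by up to 3% of the largest quantization step. A plain-Python sketch of that criterion (illustrative numbers only):

```python
def close(actual, expected, rtol, atol):
    # Mirrors the documented torch.testing.assert_close criterion:
    # |actual - expected| <= atol + rtol * |expected|
    return abs(actual - expected) <= atol + rtol * abs(expected)

scale_max = 0.08                 # illustrative stand-in for torch.max(scale)
atol = scale_max * 0.03          # tolerance proportional to the quantization step

print(close(0.501, 0.500, rtol=1e-5, atol=atol))  # → True: within 3% of a step
print(close(0.510, 0.500, rtol=1e-5, atol=atol))  # → False: drift too large
```

Scaling `atol` by the quantization step makes the test tolerant only of sub-step rounding differences between the two zero-point computations.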
Summary:
This adds some dispatch to the tinygemm kernels for CUDA, although we need to resolve the implementation mismatch problem for tinygemm first
Test Plan:
TODO
Reviewers:
Subscribers:
Tasks:
Tags: